(3) Logs Analysis using Mining.

Data set type: Industrial-Anoki

💨🔥💨 Smoke Analysis

✅ Python, ✅ GitLab, ✅ MongoDB

Qs

Q1: What are the most common problems in pipelines?

Index

Nomenclature

- (STPS) Smoke test possible solution: the set of candidate errors that could be avoided by using smoke tests.

Import python libraries

Config Variables

Page reference

Get data from MongoDB

Read data from the MongoDB database
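
As a minimal sketch of this step, the helper below pulls every matching document from a collection into a list. It assumes an object that behaves like a pymongo `Collection` (that is, it exposes `find(filter, projection)`); the function name, the connection URI, and the database and collection names in the commented usage are hypothetical placeholders, not values taken from this notebook.

```python
def fetch_log_documents(collection, query=None, projection=None):
    """Return all documents matching `query` as a list of dicts.

    `collection` is assumed to behave like a pymongo Collection,
    i.e. to expose find(filter, projection).
    """
    cursor = collection.find(query or {}, projection)
    return list(cursor)


# Hypothetical usage (URI, database and collection names are placeholders):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# docs = fetch_log_documents(client["ci_logs"]["jobs"], {"status": "failed"})
```

Materializing the cursor into a list keeps the later steps independent of the database connection, at the cost of holding all documents in memory.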

Remove from the analysis the pipelines whose name is in the blacklist

Check with @Leo

This is to avoid including logs from other types of tests in the analysis. The logs of a functional test, for example, could contribute large volumes of events that belong to other types of tests.
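
The filtering described above could look roughly like the following sketch. The blacklist entries, the `name` field, and the sample records are all hypothetical; the real pipeline names would come from the data loaded earlier.

```python
# Hypothetical blacklist: names of pipelines that are not smoke tests.
BLACKLIST = {"functional-ui", "performance-nightly"}


def drop_blacklisted(pipelines, blacklist=BLACKLIST):
    """Keep only the pipelines whose name is not in the blacklist."""
    return [p for p in pipelines if p.get("name") not in blacklist]


# Illustrative records; the field names are assumptions.
pipelines = [
    {"name": "smoke-backend", "status": "failed"},
    {"name": "functional-ui", "status": "failed"},
]
filtered = drop_blacklisted(pipelines)
```

Here `filtered` keeps only the `smoke-backend` record, since `functional-ui` appears in the blacklist.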

Analysis of data volumes.

Percentage of jobs by type

Number of failures by stage number.
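
Both volume summaries (share of each job type and failures per stage) can be sketched with `collections.Counter`. The job records and their field names (`type`, `stage`, `status`) are made up for illustration and are not the notebook's actual schema.

```python
from collections import Counter

# Hypothetical job records; field names are assumptions.
jobs = [
    {"type": "build", "stage": 1, "status": "success"},
    {"type": "test", "stage": 2, "status": "failed"},
    {"type": "test", "stage": 2, "status": "failed"},
    {"type": "deploy", "stage": 3, "status": "failed"},
]

# Percentage of jobs per type.
type_counts = Counter(job["type"] for job in jobs)
type_percent = {t: 100 * n / len(jobs) for t, n in type_counts.items()}

# Number of failed jobs per stage number.
fails_by_stage = Counter(job["stage"] for job in jobs if job["status"] == "failed")
```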

Test. Measuring Similarity Between Texts in Python

https://sites.temple.edu/tudsc/2017/03/30/measuring-similarity-between-texts-in-python/
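
One common token-based measure for this kind of test is Jaccard similarity: the size of the intersection of two token sets divided by the size of their union. The sketch below is a plain-Python illustration with made-up log messages, not necessarily the measure used in this notebook.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the whitespace-separated token sets of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Illustrative messages (not taken from the real logs).
a = "connection refused on port 5432"
b = "connection refused on port 27017"
score = jaccard_similarity(a, b)  # 4 shared tokens out of 6 distinct ones
```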

Filter logs data

Get fragment of text with error
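
A possible way to extract the fragment is to scan the log line by line and keep each matching line together with a little surrounding context. The keyword pattern, the `context` parameter, and the sample log are assumptions made for this sketch.

```python
import re

# Assumed error markers; the real logs may need a different pattern.
ERROR_PATTERN = re.compile(r"\b(error|exception|failed)\b", re.IGNORECASE)


def extract_error_fragments(log_text, context=1):
    """Return each line that matches ERROR_PATTERN plus `context` lines around it."""
    lines = log_text.splitlines()
    fragments = []
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            fragments.append("\n".join(lines[start:end]))
    return fragments


sample = "step 1 ok\nERROR: connection refused\nretrying\nstep 2 ok"
fragments = extract_error_fragments(sample)
```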

Test the filter that searches for the error text inside the logs

Apply filter to all data

Remove stopwords
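
A minimal sketch of this step, assuming a simple letters-only tokenizer and a tiny hand-written stopword set. Both are illustrative; a real analysis would likely use a fuller stopword list and may want to keep numbers or error codes.

```python
import re

# Small illustrative stopword list (an assumption, not the notebook's list).
STOPWORDS = {"the", "a", "an", "to", "of", "in", "on", "is", "at"}


def tokenize(text):
    """Lowercase the text and return its runs of letters (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]


tokens = remove_stopwords(tokenize("Failed to connect to the database on port 5432"))
```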

Exploratory analysis

In Python, one of the structures that most facilitates exploratory analysis is the Pandas DataFrame, which is where the log information is now stored. However, tokenization has introduced a major change. Before the text was split, the unit of study was the log event, and each one occupied a row, thus fulfilling the tidy data condition: one observation, one row. After tokenization, the unit of study is each token (word), which violates the tidy data condition. To get back to the ideal structure, each token list has to be expanded, duplicating the values of the other columns as many times as necessary. This process is known as expansion or unnest.

Although it may seem an inefficient process (the number of rows increases a lot), this simple change facilitates operations such as grouping, counting, and plotting.
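
The expansion can be sketched with `DataFrame.explode`, which turns each element of a list-valued column into its own row and repeats the other columns. The column names and sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical tokenized logs: one row per log event, tokens stored as a list.
logs = pd.DataFrame({
    "project": ["api", "web"],
    "tokens": [["connection", "refused"], ["timeout"]],
})

# One row per token; the "project" value is repeated for each token.
tidy = (
    logs.explode("tokens")
    .rename(columns={"tokens": "token"})
    .reset_index(drop=True)
)

# With tidy data, grouping and counting become one-liners.
words_per_project = tidy.groupby("project")["token"].count()
```

After the expansion, `tidy` has three rows (two for `api`, one for `web`), and per-project or per-event word totals reduce to a `groupby`.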

Total words used by each log event

Total words used by each project

Frequency of words
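
Word frequencies can be computed with `collections.Counter` over all token lists. The token lists below are invented for illustration.

```python
from collections import Counter

# Hypothetical token lists, one per log event.
token_lists = [
    ["connection", "refused"],
    ["connection", "timeout"],
    ["connection", "timeout"],
]

# Flatten the lists and count every token.
freq = Counter(tok for toks in token_lists for tok in toks)
top = freq.most_common(2)
```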

Create event list

In this section we try to get the list of STPS.

Group similar text
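
One simple way to group similar messages is a greedy pass that compares each message with the first member of every existing group, using `difflib.SequenceMatcher` from the standard library. This is a sketch under assumptions: the 0.8 threshold, the function name, and the sample messages are hypothetical, and the notebook may use a different similarity measure.

```python
from difflib import SequenceMatcher


def group_similar(messages, threshold=0.8):
    """Greedily group messages whose similarity to a group's first member
    is at least `threshold` (ratio between 0 and 1)."""
    groups = []
    for msg in messages:
        for group in groups:
            if SequenceMatcher(None, msg, group[0]).ratio() >= threshold:
                group.append(msg)
                break
        else:
            # No existing group was close enough: start a new one.
            groups.append([msg])
    return groups


# Illustrative error messages (not taken from the real logs).
messages = [
    "connection refused on port 5432",
    "connection refused on port 5433",
    "npm ERR! missing script: test",
]
groups = group_similar(messages)
```

Each resulting group is a candidate entry for the event list. The result depends on the order of the messages and on the threshold, so both are worth tuning on real data.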